Template-Based Information Extraction without the Templates

Authors

  • Nathanael Chambers
  • Daniel Jurafsky
Abstract

Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to the hand-created gold structure, and we extract role fillers with an F1 score of 0.40, approaching the performance of algorithms that require full knowledge of the templates.

Similar resources

Header Metadata Extraction from Semi-structured Documents Using Template Matching

With the recent proliferation of documents, automatic metadata extraction from documents has become an important task. In this paper, we propose a novel template-matching-based method for header metadata extraction from semi-structured documents stored in PDF. In our approach, templates are defined, and the document is considered as strings with format. Templates are used to guide finite state auto...

Can Wavelet Denoising Improve Motor Unit Potential Template Estimation?

Background: Electromyographic (EMG) signals obtained from a contracted muscle contain valuable information on its activity and health status. Much of this information lies in motor unit potentials (MUPs) of its motor units (MUs), collected during the muscle contraction. Hence, accurate estimation of a MUP template for each MU is crucial. Objective: To investigate the possibility of improv...

The LOLITA User-Definable Template Interface

The development of user-definable template interfaces, which allow the user to design new template definitions in a user-friendly way, is a new issue in the field of information extraction. The LOLITA user-definable template interface allows the user to define new templates using sentences in natural language text with a few restrictions and formal elements. This approach is rather different f...

Web Template Extraction Based on Hyperlink Analysis

Web templates are one of the main development resources for website engineers. Templates allow them to increase productivity by plugging content into already formatted and prepared pagelets. For the end user, templates are also useful, because they provide uniformity and a common look and feel for all webpages. However, from the point of view of crawlers and indexers, templates are an important ...

Pii: S0031-3203(96)00086-6

We propose an improved method for eye-feature extraction, description, and tracking using deformable templates. Some existing algorithms are exploited to locate the initial position of eye features, and then deformable templates are used for extracting and describing the eye features. Rather than using original energy minimization for matching the templates, the region-based approach is propos...

Publication date: 2011